A Hierarchical CLH Queue Lock

Authors

Victor Luchangco

Daniel Nussbaum

Nir Shavit

Abstract

Modern multiprocessor architectures such as CC-NUMA machines or CMPs have nonuniform communication architectures that render programs sensitive to memory access locality. A recent paper by Radović and Hagersten shows that performance gains can be obtained by developing general-purpose mutual-exclusion locks that encourage threads with high mutual memory locality to acquire the lock consecutively, thus reducing the overall cost due to cache misses. Radović and Hagersten present the first such hierarchical locks. Unfortunately, their locks are backoff locks, which are known to incur higher cache miss rates than queue-based locks, suffer from various fundamental fairness issues, and are hard to tune so as to maximize locality of lock accesses. Extending queue-locking algorithms to be hierarchical requires that requests from threads with high mutual memory locality be consecutive in the queue. Until now, it was not clear that one could design such locks because collecting requests locally and moving them into a global queue seemingly requires a level of coordination whose cost would defeat the very purpose of hierarchical locking. This paper presents a hierarchical version of the Craig, Landin, and Hagersten CLH queue lock, which we call the HCLH queue lock. In this algorithm, threads build implicit local queues of waiting threads, splicing them into a global queue at the cost of only a single CAS operation. In a set of microbenchmarks run on a large-scale multiprocessor machine and a state-of-the-art multi-threaded multi-core chip, the HCLH algorithm exhibits better performance and significantly better fairness than the hierarchical backoff locks of Radović and Hagersten.
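For readers unfamiliar with the base algorithm, the following is a minimal sketch of a plain (non-hierarchical) CLH queue lock in Java. It is not the HCLH algorithm from the paper: there are no per-cluster local queues and no splicing step, and enqueueing uses an atomic swap rather than the CAS mentioned in the abstract. It only illustrates the implicit queue that HCLH builds on, in which each thread spins on its predecessor's node. The class name `CLHLock`, the node type `QNode`, and the helper `demo` are illustrative names, not taken from the paper.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a basic CLH queue lock (assumption: NOT the paper's HCLH lock).
public class CLHLock {
    // One node per lock request; `locked` stays true while its owner
    // is waiting for the lock or holding it.
    private static final class QNode {
        volatile boolean locked;
    }

    // Tail of the implicit queue; starts at an unlocked dummy node.
    private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
    // Each thread's current node and the predecessor node it spins on.
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
    private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

    public void lock() {
        QNode node = myNode.get();
        node.locked = true;
        // A single atomic swap appends this request to the queue.
        QNode pred = tail.getAndSet(node);
        myPred.set(pred);
        // Spin on the predecessor's node until it releases the lock.
        while (pred.locked) {
            Thread.onSpinWait();
        }
    }

    public void unlock() {
        QNode node = myNode.get();
        node.locked = false;       // hand the lock to the successor, if any
        myNode.set(myPred.get());  // recycle the predecessor's node for next time
    }

    // Hypothetical helper for illustration: `threads` workers each increment
    // a shared counter `iters` times under the lock; returns the final count.
    static int demo(int threads, int iters) {
        CLHLock lock = new CLHLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iters; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

Because each waiter spins only on its predecessor's node, lock handoff follows arrival order. A hierarchical variant, as the abstract describes, would additionally need requests from the same cluster to sit next to each other in that queue.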


Related resources

RH Lock: A Scalable Hierarchical Spin Lock

Scalable architectures with non-uniform memory access time (NUMAs) have gained increased popularity in recent years. The increased scalability has increased the demand for scalable lock implementations, such as the queue-based locks of Mellor-Crummey and Scott (MCS lock), and of Craig, Landin and Hagersten (CLH lock). This paper demonstrates that the first-come first-served nature of queue-bas...

Lock cohorting: A general technique for designing NUMA locks

Multicore machines are quickly shifting to NUMA and CC-NUMA architectures, making scalable NUMA-aware locking algorithms, ones that take into account the machines’ non-uniform memory and caching hierarchy, ever more important. This paper presents lock cohorting, a general new technique for designing NUMA-aware locks that is as simple as it is powerful. Lock cohorting allows one to transform any...

Topic 12: Theory and Algorithms for Parallel Computation - (Introduction)

Parallelism exists at all levels in computing systems from circuits to grids. Effective use of parallelism crucially relies on the availability of suitable models of computation for algorithm design and analysis, and on efficient strategies for the solution of key computational problems on prominent classes of platforms. The study of foundational and algorithmic issues has led to many important ad...

Starling: Lightweight Concurrency Verification with Views

Modern program logics have made it feasible to verify the most complex concurrent algorithms. However, many such logics are complex, and most lack automated tool support. We propose Starling, a new lightweight logic and automated tool for concurrency verification. Starling takes a proof outline written in an abstracted Hoare-logic style, and converts it into proof terms that can be discharged b...

Performance impact of run queue organization and synchronization on large-scale NUMA multiprocessor systems

The goal of this paper is to study the impact of run queue organization on the performance of synchronization methods in multiprocessor systems. Two run queue organizations are considered: distributed and hierarchical organizations. The performance impact of spinning and blocking synchronization methods on these two run queue organizations is studied. We use two canonical workload types that re...

Publication date: 2006
